Many models are too complex for quadratic approximation or other short cuts.
We need a general method of numeric integration to explore the posterior effectivly.
MCMC \(\lim_{t\to\infty}\) posterior
Monte Carlo: “repeated random sampling to obtain numerical results” -wikipedia
Markov chain: stochastic model in which state at time \(t\) depends only on previous state at \(t-1\)
Imagine this one-dimensional posterior
Imagine this one-dimensional posterior:
Imagine we take a guess for \(\theta\) This is the first step of our (Markov) chain
We can then find the height for this guess
height = likelihood of data given \(\theta\) \(\times\) prior of \(\theta\)
= \(\text{Pr}(x|\theta) \times \text{Pr}(\theta)\).
Choose a proposal for a new value on the chain. - Our proposal distribution is centered on our last guess. - Can be any symmetric shape.
Find height (posterior-ish value) for our proposal
Calculate the ratio of those heights: \[ r = \frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x|\theta)\text{Pr}(\theta)} = \frac{0.078}{0.07} = 1.124 \]
We want to calculate the ratio of posterior probabilities:
\[ r = \frac{\text{Pr}(\theta_p | x)}{\text{Pr}(\theta_{t-1} | x)} = \frac{\frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x)}}{\frac{\text{Pr}(x|\theta_{t-1})\text{Pr}(\theta_{t-1})}{\text{Pr}(x)}} \] but we do not know (and cannot calculate) \(\text{Pr}(x)\).
However, they cancel out! We just need the numerators! \[ r = \frac{\text{Pr}(\theta_p | x)}{\text{Pr}(\theta_{t-1} | x)} = \frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x|\theta)\text{Pr}(\theta)} \] ## Step 3 {.smaller}
Choose a random uniform variable, \(U \sim \text{Uniform}(0,1)\)
If \(r > U\), accept the proposal & add it to the chain
\[ r = \frac{0.061}{0.078} = 0.781 ; U = 0.89 \]
Since \(r < U\), reject proposal & keep the current position.
\[ r = \frac{0.079}{0.078} = 1.004 \] \(U=0.212\)
Accept proposal & add to chain
It works!
This is the Metropolis Algorithm
What if we get proposals outside of possible boundaries?
However, proposal distribution is \(a\)symmetric:
\[ \text{Pr}(\theta_{t-1} \rightarrow \theta_p) \neq \text{Pr}(\theta_{p} \rightarrow \theta_{t-1}) \]
Adjust the ratio to: \[ r = \frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x|\theta_{t-1})\text{Pr}(\theta_{t-1})}\times \frac{\text{Pr}(\theta_{t-1}|\theta_p)}{\text{Pr}(\theta_p|\theta_{t-1})} \]
and it works!
Imagine a proposed jump from one of these points could easily be far from posterior density
Hamiltonian Monte Carlo
First, take the negative log, so high probability = low areas
First, take the negative log, so high probability = low areas
Take the first guess
…and give it a bit of momentum (in a direction of parameter space)
Track it’s movement for a certain amount of “time” using Hamiltonian equations
\[ \text{Total Energy} = \text{Kinetic} + \text{Momentum} \]
After a bit of time, point at end is the new proposal
No analytic solution to Hamiltonian for most problems, so:
No analytic solution to Hamiltonian for most problems, so:
Overall, each proposal requires many more calculations (many steps, calculating kinetic energy & momentum at each), but proposals are much better / rarely rejected, so much more efficient overall